Unsupervised Phoneme Segmentation Using Mahalanobis Distance

Authors

  • Yu QIAO
  • Nobuaki MINEMATSU
Abstract

Phoneme segmentation is one of the fundamental problems in speech engineering. Approaches to it fall into two categories: supervised and unsupervised. The approach in this paper belongs to the second category, which attempts phonetic segmentation without any prior knowledge of linguistic content or acoustic models. In an earlier work, we formulated segmentation as an optimization problem through statistical and information analysis, and developed an objective function, the summation of squared error (SSE), based on the Euclidean distance between cepstral features. However, it is not known whether Euclidean distance is the best metric for estimating the goodness of a segmentation. A popular generalization of Euclidean distance is the Mahalanobis distance (MD). In this paper, we study whether and how MD can improve segmentation performance. The essential problem is how to determine the parameters (the covariance matrix) used in the MD calculation. We treat this as a learning problem and propose two criteria for determining the optimal parameters: Minimum of Summation Variance (MSV) and Maximum of Discrimination Variance (MDV). MSV minimizes the summed within-phoneme variance, while MDV simultaneously maximizes the between-phoneme variance and minimizes the within-phoneme variance. Both lead to closed-form solutions through matrix calculation. We also propose an algorithm that learns the parameters without labeled data. We carried out experiments on the TIMIT database to evaluate the proposed methods. The results indicate that using a learned MD can increase the correct recall rate. We also found that the use of power can further improve the results.
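To make the objective concrete, the following is a minimal sketch of an SSE-style segmentation score computed with a Mahalanobis-type metric. It is an illustration under stated assumptions, not the authors' implementation: the function name `mahalanobis_sse`, the toy data, and the choice to pass the metric matrix `M` directly (in place of an inverse covariance learned by MSV or MDV) are all hypothetical. With `M` set to the identity, the score reduces to the Euclidean SSE mentioned in the abstract.

```python
import numpy as np


def mahalanobis_sse(features, boundaries, M):
    """Sum of squared Mahalanobis deviations of frames from their segment mean.

    features:   (T, d) array of frame-level feature vectors (e.g. cepstra).
    boundaries: strictly increasing frame indices in (0, T), each starting
                a new segment.
    M:          (d, d) symmetric positive semi-definite metric matrix
                (hypothetical stand-in for a learned inverse covariance).
    """
    edges = [0, *boundaries, len(features)]
    total = 0.0
    for start, end in zip(edges[:-1], edges[1:]):
        segment = features[start:end]
        centered = segment - segment.mean(axis=0)
        # Sum over frames t of centered[t]^T M centered[t].
        total += np.einsum("td,de,te->", centered, M, centered)
    return float(total)


# Toy data (made up for illustration): two well-separated clusters of frames.
toy = np.array([
    [0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
    [4.0, 4.0], [4.2, 4.0], [4.0, 4.2],
])

identity = np.eye(2)
score_at_true_boundary = mahalanobis_sse(toy, [3], identity)
score_at_wrong_boundary = mahalanobis_sse(toy, [2], identity)
```

In this toy case, placing the boundary between the two clusters gives a lower score than placing it one frame early, which is the behavior a segmentation objective of this kind relies on. How `M` should be chosen is the question the paper addresses; the sketch leaves it as an input.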


Similar resources

Image Segmentation By Self Organizing Map With Mahalanobis Distance

Image segmentation is the classification of data sets into groups of similar data points. This article proposes a method to determine the winner unit in a self-organizing map network. The distance between the input vector and the weight vector is measured by the Mahalanobis distance, and the unit whose weight vector has the smallest Mahalanobis distance from the input vector is chosen. The re...


Metric learning for unsupervised phoneme segmentation

Unsupervised phoneme segmentation aims at dividing a speech stream into phonemes without using any prior knowledge of linguistic contents and acoustic models. In [1], we formulated this problem into an optimization framework, and developed an objective function, summation of squared error (SSE) based on the Euclidean distance of cepstral features. However, it is unknown whether or not Euclidean...


Unsupervised Phoneme Segmentation Using Transformed Cepstrum Features

One of the basic problems in speech engineering is phoneme segmentation, that is, to divide a speech stream into a string of phonemes. Automatic Speech Recognition (ASR) models often require reliable phoneme segmentation in the initial training phase, and Text-to-Speech (TTS) systems need a large speech database with correct phoneme segmentation information for improving the performance. Human ...


Unsupervised Segmentation of Phoneme Sequences based on Pitman-Yor Semi-Markov Model using Phoneme Length Context

Unsupervised segmentation of phoneme sequences is an essential process to obtain unknown words during spoken dialogues. In this segmentation, an input phoneme sequence without delimiters is converted into segmented sub-sequences corresponding to words. The Pitman-Yor semi-Markov model (PYSMM) is promising for this problem, but its performance degrades when it is applied to phoneme-level word seg...


Unsupervised Texture Image Segmentation Using MRFEM Framework

Texture image analysis is one of the most important areas of image processing in medical sciences and industry. To date, different approaches have been proposed for the segmentation of texture images. In this paper, we present unsupervised texture image segmentation based on a Markov Random Field (MRF) model. First, we used Gabor filter with different parameters’ (frequency, orientatio...




Publication date: 2008